M. V. Burnashev, H. Yamamoto 



ON ZERO-RATE ERROR EXPONENT
FOR BSC WITH NOISY FEEDBACK




A binary symmetric channel is used for information transmission. There is
also another noisy binary symmetric channel (the feedback channel), and the
transmitter observes, without delay, all outputs of the forward channel via that
feedback channel. The transmission of a nonexponential number of messages
(i.e. the transmission rate equals zero) is considered. The achievable decoding
error exponent for such a combination of channels is investigated. It is shown
that if the crossover probability of the feedback channel is less than a certain
positive value, then the achievable error exponent is better than the corresponding
error exponent of the no-feedback channel.

The transmission method described and the corresponding lower bound for
the error exponent can be strengthened, and also extended to positive
transmission rates.



The binary symmetric channel BSC$(p)$ with crossover probability $0 < p < 1/2$ (and
$q = 1 - p$) is considered. It is assumed that there is a feedback BSC$(p_1)$ channel, and
the transmitter observes (without delay) all outputs of the forward BSC$(p)$ channel via that
noisy feedback channel. No coding is used in the feedback channel (i.e. the receiver simply
re-transmits all received outputs to the transmitter). In other words, the feedback channel is
"passive".

Since Shannon's paper pQ it has been known that even noiseless feedback does not
increase the capacity of the BSC (or any other memoryless channel). However, feedback
can improve the decoding error probability (or simplify the effective transmission method).
For the BSC with noiseless feedback, the decoding error probability (or its best error
exponent, the channel reliability function) has been actively studied since
Dobrushin [2], Horstein [3] and Berlekamp [4]. Some characteristics of a number of efficient
transmission methods have been investigated (see, for example, [1-10]). Generally, the case
of BSC with noiseless feedback is reasonably well investigated (although there are still some
important open problems).

The case of noisy feedback has not been investigated. It was not even known whether such
feedback can improve the error exponent of the no-feedback case. In this respect, only
two recent papers [11], [12] can probably be mentioned, but both of them consider different
problems. In the paper [11] variable-length coding (i.e. non-block codes) is used under
a different error criterion. Moreover, it is assumed that at certain moments an error-free
mechanism in the feedback is available. In the paper [12] a Gaussian channel with only an
average power constraint is considered. Such a constraint allows using some methods which
are unavailable in the case of discrete channels.

We try to explain why the noisy feedback case is so poorly investigated, and
what creates the main difficulty (as we see it). In the noiseless feedback case the transmitter
at any moment may change its coding function (transmission method), and the receiver will 
know exactly about this change. Such an ideal mutual understanding (mutual coordination) 
between the transmitter and the receiver was very important for all results on the noiseless 
feedback case [1-10]. If we try to apply any of the transmission methods from [1-10] to the
noisy feedback case, we find that the transmitter and the receiver rather quickly lose their
mutual coordination. Due to noise in the feedback link they can achieve mutual coordination
only in some probabilistic sense. In particular, if the transmitter wants to change its coding 
function at some moment t, it should know with high reliability the current output values 
of some functions (e.g. posterior message probabilities) at the receiver. Of course, it takes a 
certain time to achieve high reliability of such knowledge. For that reason, the transmitter 
should probably change the coding function not very often (i.e. only after accumulating 
some very reliable information on the receiver uncertainty). 

The following geometrical picture illustrates that description. Let $\mathcal{D}_1, \ldots, \mathcal{D}_M$ be the
optimal decoding regions of messages $\theta_1, \ldots, \theta_M$, respectively. The boundary part of each
region $\mathcal{D}_i$ gives the main contribution to the decoding error. The transmitter's aim is to
"push" the output into the corresponding region $\mathcal{D}_i$. The best transmitter strategy is to
"push" the current output in the direction "orthogonal" to the closest boundary of the true
region. Then, essentially, two cases are possible.

1) If all $\mathcal{D}_i$ are "round-shaped" (i.e. similar to "balls"), then they have centers, and
therefore the best transmitting strategy is to send the center of the corresponding "ball"
(and that strategy does not depend on the output signals). It automatically pushes the 
output in the direction "orthogonal" to the closest boundary. This situation takes place for 
sufficiently high transmission rates R. Then, even noiseless feedback cannot improve the 
error exponent. 

2) The situation becomes quite different if the optimal decoding regions $\{\mathcal{D}_i\}$ are not
"round-shaped" (and so, they do not have natural centers). Now the best transmitter
strategy depends on the current output location. For the case of three messages, it is depicted
in Fig. 1. Let the message $\theta_1$ be transmitted, so that the transmitter pushes the output
into the region $\mathcal{D}_1$. If the current output is close to the point A (i.e. to the two other possible
regions), then it is best to push the output simultaneously away from both competing regions.
On the contrary, if the current output is close to the point B (i.e. it is much closer to the
competing region $\mathcal{D}_2$ than to $\mathcal{D}_3$), then it is best to push the output mainly away from the
region $\mathcal{D}_2$, paying less attention to the other region $\mathcal{D}_3$.

That best strategy is possible only if the transmitter knows exactly the current output
location (i.e. if there is noiseless feedback). If there is no feedback at all, then the transmitter
knows nothing about the current output location, and there is no sense in changing the push
direction. The situation becomes "fuzzy" if the transmitter knows the current output location
only approximately (i.e. if there is noisy feedback).

In this paper we realize those arguments, allowing only one fixed time moment when 
the transmitter may change the coding function. At that moment the transmitter, using 
observations over the feedback channel, finds two messages which are the most probable for 
the receiver. After that the transmitter only helps the receiver to decide between those two 
messages. Of course, an error is possible when choosing those two most probable messages. 
However, we show that if the crossover probability of the feedback channel is less than a
certain positive value, then the probability of making an error in that choice is sufficiently
small. Such a simple transmission method (together with properly chosen decoding) already
improves the decoding error probability in comparison with the no-feedback case.

Of course, if the feedback channel noise is rather small then it is possible to use a larger 
number of such "switching" moments, and to improve further the error probability exponent. 
In the limit (if the feedback channel noise is very small), using a growing number of switching 
moments, we can achieve the noiseless feedback case performance. 

We consider the case when the overall transmission time $n$ and $M = M_n$ equiprobable
messages $\{\theta_1, \ldots, \theta_M\}$ are given. It is assumed that $M_n \to \infty$, but $\ln M_n = o(n)$ as $n \to \infty$,
i.e. the transmission rate $R = 0$. After the moment $n$, the receiver makes a decision $\hat\theta$ on
the message transmitted. We limit ourselves here only to the case $R = 0$, since in that
case the difficulties of using noisy feedback are seen most clearly. In the case of a positive
transmission rate $R$ (it will be considered in another publication) some additional technical
difficulties appear, which we want to avoid for now. It should also be mentioned that the
investigation of the best error exponent for $R = 0$, even for the noiseless feedback case, is not
a simple task [1].

As a result, we show that if the crossover probability $p_1$ of the feedback channel BSC$(p_1)$
is less than a certain positive value $p_0(p)$, then it is possible to improve the best error
exponent $E(p)$ of BSC$(p)$ without feedback. The transmission method with one "switching"
moment, giving such an improvement, is described in §3.

Denote by $E(p)$ the best error exponent for $M_n$ codewords over BSC$(p)$ without feedback,
i.e.

$$E(p) = \limsup_{n \to \infty} \frac{1}{n} \ln \frac{1}{P_e(M_n, n, p)}, \qquad \ln M_n = o(n), \tag{1}$$

where $P_e(M_n, n, p)$ is the minimal possible decoding error probability $P_e$ over all codes of
length $n$. Clearly, we have

$$E(p) = \frac{1}{4} \ln \frac{1}{4pq}. \tag{2}$$

Indeed, the minimal Hamming distance of any such code does not exceed $n/2$ (Plotkin
bound). On the other hand, due to the Varshamov-Gilbert bound there exist codes with
approximately such minimal distance. If $E(R,p)$ is the reliability function of the BSC$(p)$
without feedback, then $E(p) = E(0,p)$.

Denote by $E_2(p)$ the best error exponent for two codewords over BSC$(p)$ (it remains the
same for the channel with noiseless feedback, as well). Clearly, we have

$$E_2(p) = \frac{1}{2} \ln \frac{1}{4pq}.$$
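
The two-codeword exponent can be checked numerically. The following is a minimal sketch, not part of the original derivation: it computes the exact maximum-likelihood error probability for two codewords at Hamming distance $d$ over BSC$(p)$ and compares $-(1/d)\ln P$ with $\frac{1}{2}\ln\frac{1}{4pq}$. The function names and the values $p = 0.1$, $d = 2000$ are illustrative assumptions.

```python
import math

def log_binom_pmf(d, k, p):
    """Natural log of the Binomial(d, p) probability mass at k."""
    return (math.lgamma(d + 1) - math.lgamma(k + 1) - math.lgamma(d - k + 1)
            + k * math.log(p) + (d - k) * math.log(1 - p))

def pairwise_error_prob_log(d, p):
    """ln of the ML error probability for two codewords at distance d over BSC(p).

    An error occurs when more than d/2 of the d differing positions are
    flipped; an exact tie (possible only for even d) is broken by a fair coin.
    """
    terms = []
    for k in range(d // 2, d + 1):
        if 2 * k < d:
            continue                      # fewer than half flipped: no error
        lp = log_binom_pmf(d, k, p)
        if 2 * k == d:
            lp += math.log(0.5)           # tie: wrong with probability 1/2
        terms.append(lp)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp

def E2(p):
    """Two-codeword exponent E_2(p) = (1/2) ln(1/(4pq))."""
    return 0.5 * math.log(1.0 / (4.0 * p * (1.0 - p)))

# Illustrative values (assumptions, not taken from the paper).
p, d = 0.1, 2000
empirical = -pairwise_error_prob_log(d, p) / d
print(f"finite-d exponent: {empirical:.5f}, E_2(p): {E2(p):.5f}")
```

For finite $d$ the computed value lies slightly above $E_2(p)$, since the Chernoff bound $(4pq)^{d/2}$ overestimates the tail by a subexponential factor.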

Denote by $F(p)$ the best error exponent for $M_n$ messages over BSC$(p)$ with noiseless
feedback. It is defined similarly to (1), where $P_e(M_n, n, p)$ is the minimal possible decoding
error probability over all transmission methods. Denote also by $F_3(p)$ the best error exponent
for three messages over BSC$(p)$ with noiseless feedback. Then [1]

$$F(p) = F_3(p) = -\ln\left(p^{1/3} q^{2/3} + q^{1/3} p^{2/3}\right). \tag{3}$$

If $F(R,p)$ is the reliability function of such a channel, then $F(p) = F(0,p)$.

Denote by $F(p,p_1)$ the best error exponent for $M_n$ messages transmitted over the BSC$(p)$
with the noisy BSC$(p_1)$ feedback channel. Clearly, $E(p) \le F(p,p_1) \le F(p)$ for all $p, p_1$. In
particular, $F(p,0) = F(p)$, $F(p,1/2) = E(p)$. Moreover, $E(p) < F(p) < E_2(p)$, $0 < p < 1/2$.
Let $r(p) = F(p)/E(p)$. The function $r(p)$ monotonically increases in $p$, and, in particular,

$$r(0) = \lim_{p \to 0} r(p) = 4/3, \qquad r(0.01) \approx 1.67, \qquad r(1/2) = \lim_{p \to 1/2} r(p) = 16/9 \approx 1.78.$$

More exactly, if $p = (1-\varepsilon)/2$ then (as $\varepsilon \to 0$)

$$E(p) = \varepsilon^2/4 + O(\varepsilon^4), \qquad F(p) = 4\varepsilon^2/9 + O(\varepsilon^4).$$
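
The quoted values of $r(p)$ are easy to reproduce from the closed forms (2) and (3). The snippet below is a small illustrative sketch; the function names `E`, `F`, `r` are chosen here for convenience.

```python
import math

def E(p):
    """Zero-rate exponent without feedback, formula (2)."""
    return 0.25 * math.log(1.0 / (4.0 * p * (1.0 - p)))

def F(p):
    """Zero-rate exponent with noiseless feedback, formula (3)."""
    q = 1.0 - p
    return -math.log(p ** (1 / 3) * q ** (2 / 3) + q ** (1 / 3) * p ** (2 / 3))

def r(p):
    """Gain of noiseless feedback over no feedback at zero rate."""
    return F(p) / E(p)

for p in (1e-6, 0.01, 0.1, 0.4, 0.499):
    print(f"p={p:<8} E={E(p):.6f}  F={F(p):.6f}  r={r(p):.4f}")
```

The convergence $r(p) \to 4/3$ as $p \to 0$ is slow (logarithmic in $1/p$), which is why $r(0.01)$ is already close to $1.67$.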

Below in the paper $f \sim g$ means $n^{-1} \ln f = n^{-1} \ln g + o(1)$, $n \to \infty$, and $f \lesssim g$ means
$n^{-1} \ln f \le n^{-1} \ln g + o(1)$, $n \to \infty$.

To formulate the main result of the paper, introduce the functions:

$$h(x) = -x \ln x - (1-x)\ln(1-x), \qquad z = q/p, \qquad z_1 = q_1/p_1,$$

$$3G_1(t,p) = \ln\frac{1}{qp^2} - \max_a \left\{2h(a) + h(a+t) + (a+t)\ln z\right\},$$

$$3G_2(t,p,p_1) = (2c_0+t)\ln z - h(c_0+t) - h(c_0) + [2+t-2(1+t)b_1]\ln z_1 - (1+t)h(b_1) - (1-t)h\left(\frac{(1+t)b_1-t}{1-t}\right) + 2\ln\frac{1}{qq_1}, \tag{4}$$

$$c_0(t,p) = \frac{2(1-t)}{2 + t(z^2-1) + \sqrt{4z^2 + t^2(z^2-1)^2}}, \qquad b_1(t,p_1) = \frac{2z_1^2}{(2+t)z_1^2 - t + \sqrt{4z_1^2 + (z_1^2-1)^2t^2}}.$$

The optimal $a_0 = a_0(p,t)$ in (4) is defined as the unique root of the equation

$$q(1-a)^2(1-a-t) = pa^2(a+t). \tag{5}$$

We have $a_0(p,0) = a_0(p)$, where $a_0(p)$ is the same as defined below in (15).
Introduce also the function $p_0(p)$ as the unique root of the equation

$$3G_2(1/2-p,\, p,\, p_0) = \ln\frac{1}{4pq}, \qquad 0 < p < 1/2. \tag{6}$$
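
The function $G_1(t,p)$ has no closed form for $t > 0$, but it is easy to evaluate: the root of (5) can be found by bisection and substituted into (4). The sketch below is one possible way to do this (the helper names and the bisection approach are choices made here, not taken from the paper). It also checks two facts used later: $G_1(0,p) = F(p)$ and $3G_1(1/2-p,p) = 4E(p)$.

```python
import math

def h(x):
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def E(p):
    return 0.25 * math.log(1.0 / (4.0 * p * (1.0 - p)))

def F(p):
    q = 1.0 - p
    return -math.log(p ** (1 / 3) * q ** (2 / 3) + q ** (1 / 3) * p ** (2 / 3))

def a0(p, t, iters=200):
    """Root of q(1-a)^2(1-a-t) = p a^2 (a+t) on (0, 1-t), for 0 <= t < 1."""
    q = 1.0 - p
    lo, hi = 0.0, 1.0 - t
    for _ in range(iters):
        a = 0.5 * (lo + hi)
        # left side decreases and right side increases in a, so the sign
        # of the difference tells on which side of the root we are
        if q * (1 - a) ** 2 * (1 - a - t) > p * a ** 2 * (a + t):
            lo = a
        else:
            hi = a
    return 0.5 * (lo + hi)

def G1(t, p):
    """G_1(t, p) from (4), with the maximum over a taken at the root of (5)."""
    q = 1.0 - p
    a = a0(p, t)
    inner = 2 * h(a) + h(a + t) + (a + t) * math.log(q / p)
    return (math.log(1.0 / (q * p * p)) - inner) / 3.0

p = 0.1
print(G1(0.0, p), F(p))                   # the two values should agree
print(3 * G1(0.5 - p, p), 4 * E(p))       # the two values should agree
```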

Denote by $F_1(p,p_1)$ the error exponent for the transmission method with one switching
moment, described in §3. Clearly, $F_1(p,p_1) \le F(p,p_1)$ for all $p, p_1$. The main result of the paper is

Theorem. If $p_1 < p_0(p)$, then

$$F(p,p_1) \ge F_1(p,p_1) = \max_t \frac{6\min\{G_1(t,p),\, G_2(t,p,p_1)\}\,E(p)}{3\min\{G_1(t,p),\, G_2(t,p,p_1)\} + 4E(p)} > E(p). \tag{7}$$

The function $G_1(t,p)$ monotonically decreases in $t$, and $G_1(0,p) = F(p)$. On the other
hand, the function $G_2(t,p,p_1)$ monotonically increases in $t$. Moreover, $G_2(0,p,p_1) = 0$, and
$G_2(t,p,0) = \infty$, $t > 0$.
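
Because $G_1$ decreases and $G_2$ increases in $t$, the maximum in (7) is attained where the two curves cross. The following sketch evaluates the right-hand side of (7) under that observation. It is an illustration only: the function names are hypothetical, the formulas are those of (4) and (5), and the search is restricted to $0 \le t \le 1/2 - p$, the range relevant to the theorem.

```python
import math

def h(x):
    """Binary entropy in nats; arguments are clipped to [0, 1]."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def E(p):
    return 0.25 * math.log(1.0 / (4.0 * p * (1.0 - p)))

def G1(t, p):
    """G_1 from (4); the maximizing a is the root of (5), found by bisection."""
    q = 1.0 - p
    lo, hi = 0.0, 1.0 - t
    for _ in range(200):
        a = 0.5 * (lo + hi)
        if q * (1 - a) ** 2 * (1 - a - t) > p * a ** 2 * (a + t):
            lo = a
        else:
            hi = a
    a = 0.5 * (lo + hi)
    return (math.log(1.0 / (q * p * p)) - 2 * h(a) - h(a + t)
            - (a + t) * math.log(q / p)) / 3.0

def G2(t, p, p1):
    """G_2 from (4), using the closed forms for c_0 and b_1 (requires 0 < p1 <= 1/2)."""
    q, q1 = 1.0 - p, 1.0 - p1
    z2, z12 = (q / p) ** 2, (q1 / p1) ** 2
    c0 = 2 * (1 - t) / (2 + t * (z2 - 1) + math.sqrt(4 * z2 + (t * (z2 - 1)) ** 2))
    b1 = 2 * z12 / ((2 + t) * z12 - t + math.sqrt(4 * z12 + ((z12 - 1) * t) ** 2))
    val = ((2 * c0 + t) * math.log(q / p) - h(c0 + t) - h(c0)
           + (2 + t - 2 * (1 + t) * b1) * math.log(q1 / p1)
           - (1 + t) * h(b1) - (1 - t) * h(((1 + t) * b1 - t) / (1 - t))
           + 2 * math.log(1.0 / (q * q1)))
    return val / 3.0

def F1(p, p1):
    """Right-hand side of (7), maximizing over 0 <= t <= 1/2 - p."""
    Ep = E(p)
    lo, hi = 0.0, 0.5 - p
    if G2(hi, p, p1) <= G1(hi, p):
        m = G2(hi, p, p1)                 # curves do not cross on this range
    else:
        for _ in range(200):              # bisection for G1(t) = G2(t)
            mid = 0.5 * (lo + hi)
            if G1(mid, p) > G2(mid, p, p1):
                lo = mid
            else:
                hi = mid
        m = G1(0.5 * (lo + hi), p)
    return 6 * m * Ep / (3 * m + 4 * Ep)

print(F1(0.1, 1e-4), E(0.1))   # small feedback noise: exponent above E(p)
print(F1(0.1, 0.4), E(0.1))    # large feedback noise: no gain from this bound
```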

The function $p_0(p)$, $0 < p < 1/2$, is positive and monotonically increases in $p$. Its plot is
shown in Fig. 2.

Example 1. Consider the case $p \to 0$. Then

$$p_0(p) = \frac{p}{2}\,(1 + o(1)).$$

The approximation $p_0(p) \approx p/2$ is quite accurate for $p < 0.01$.

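Example 2. Consider the opposite asymptotic case $p = (1-\varepsilon)/2$, $\varepsilon \to 0$, and
$t < 1/2 - p = \varepsilon/2$. Then $a_0 = a_0(p,t) = 1/2 - \rho$, $\rho \to 0$, and after standard algebra we get

$$\rho = \frac{2t - \varepsilon}{6} + O(\varepsilon^2),$$

which gives

$$G_1(t,p) = \frac{4(\varepsilon^2 - \varepsilon t + t^2)}{9} + O(\varepsilon^3), \qquad G_2(t,p,p_1) = \frac{t^2}{12p_1q_1} + O(\varepsilon^3).$$

If $G_1(t,p) = G_2(t,p,p_1)$, then

$$t = \frac{4\varepsilon\sqrt{p_1q_1}}{\sqrt{3 - 12p_1q_1} + 2\sqrt{p_1q_1}} + O(\varepsilon^2),$$

which gives

$$\min\{G_1(t,p),\, G_2(t,p,p_1)\} = \frac{4\varepsilon^2}{3\left[\sqrt{3 - 12p_1q_1} + 2\sqrt{p_1q_1}\right]^2} + O(\varepsilon^3).$$

The condition $t < \varepsilon/2$ is equivalent to the inequality $16p_1q_1 < 1$, which means that

$$\lim_{p \to 1/2} p_0(p) = \frac{1}{4(2+\sqrt{3})} \approx \frac{1}{14.93} \approx 0.067.$$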
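
The threshold $p_0(p)$ can also be computed directly from its definition (6). Below is a minimal sketch that solves (6) by bisection in the feedback crossover probability, assuming (as the uniqueness of the root suggests) that $G_2$ decreases as $p_1$ grows on $(0, 1/2)$. The function names are illustrative; for $p$ close to $1/2$ the result can be compared with the limit $0.067$ obtained above.

```python
import math

def h(x):
    """Binary entropy in nats; arguments are clipped to [0, 1]."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def E(p):
    return 0.25 * math.log(1.0 / (4.0 * p * (1.0 - p)))

def G2(t, p, p1):
    """G_2 from (4), using the closed forms for c_0 and b_1 (requires 0 < p1 <= 1/2)."""
    q, q1 = 1.0 - p, 1.0 - p1
    z2, z12 = (q / p) ** 2, (q1 / p1) ** 2
    c0 = 2 * (1 - t) / (2 + t * (z2 - 1) + math.sqrt(4 * z2 + (t * (z2 - 1)) ** 2))
    b1 = 2 * z12 / ((2 + t) * z12 - t + math.sqrt(4 * z12 + ((z12 - 1) * t) ** 2))
    val = ((2 * c0 + t) * math.log(q / p) - h(c0 + t) - h(c0)
           + (2 + t - 2 * (1 + t) * b1) * math.log(q1 / p1)
           - (1 + t) * h(b1) - (1 - t) * h(((1 + t) * b1 - t) / (1 - t))
           + 2 * math.log(1.0 / (q * q1)))
    return val / 3.0

def p0(p, iters=200):
    """Root in p1 of 3*G2(1/2 - p, p, p1) = ln(1/(4pq)), equation (6)."""
    t, target = 0.5 - p, 4.0 * E(p)
    lo, hi = 1e-12, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # assumption: G2 is decreasing in the feedback noise p1
        if 3.0 * G2(t, p, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (0.3, 0.4, 0.49):
    print(f"p={p}: p0(p) = {p0(p):.4f}")
```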



For $p_1 \to 0$ we get

$$F(p,p_1) \ge \frac{8E(p)}{7}\left[1 - \frac{4\sqrt{3p_1}}{7} + \frac{104\,p_1}{49} + O\left(p_1^{3/2}\right)\right]. \tag{8}$$



In words, for small $p_1$ the strategy described in §3 gives a 14% gain over the no-feedback
channel.



Corollary. If $p_1 = 0$, then

$$F(p,0) \ge F_1(p,0) = \frac{6E(p)F(p)}{4E(p) + 3F(p)}. \tag{9}$$



Example 3. We have $F_1(p,p_1) \to F_1(p,0)$ as $p_1 \to 0$. We investigate the rate of that
convergence since it gives some idea of when the noisy feedback behaves like the noiseless
feedback. If $p_1 \to 0$, then the optimal $t \to 0$. For a fixed $0 < p < 1/2$ and $t \to 0$, for the root
$a_0(t,p)$ of the equation (5) we have

$$a_0(t,p) = a_0(p) - \frac{t}{3} + O(t^2),$$

which gives

$$G_1(t,p) = F(p) - \frac{2t}{9}\ln z + O(t^2).$$

We can also get, as $p_1, t \to 0$,

$$c_0(t,p) = p - \frac{t}{2} + \frac{(q-p)\,t^2}{8qp} + O(t^3),$$



$$b_1(t,p_1) = 1 - t + O\left(\frac{p_1^2}{p_1+t}\right) + O(t^2),$$

which gives

$$3G_2(t,p,p_1) = -t\ln p_1 + O(p_1\ln p_1) + O(t\ln t) + O(t^2\ln p_1).$$
If $G_1(t,p) = G_2(t,p,p_1)$, then

$$t = \frac{3F(p)}{\ln(1/p_1)}\,(1 + o(1))$$

and

$$\min\{G_1(t,p),\, G_2(t,p,p_1)\} = F(p)\left[1 - \frac{2\ln z}{3\ln(1/p_1)} + o\left(\frac{1}{\ln(1/p_1)}\right)\right].$$

As a result, we get as $p_1 \to 0$

$$F_1(p,p_1) = F_1(p,0)\left[1 - \frac{8E\ln z}{3(4E+3F)\ln(1/p_1)} + o\left(\frac{1}{\ln(1/p_1)}\right)\right].$$

Remark 1. The transmission method described in §3 reduces the problem to testing the
two most probable messages (at a fixed moment). Such a strategy is not optimal even for
one switching moment. But it is relatively simple to investigate, and it already gives a
reasonable improvement over the no-feedback case.

In §2 the transmission method with one switching moment for the channel with noiseless
feedback is described and investigated. In particular, the formula (9) is proved. In §3 that
transmission method (slightly modified) is investigated for the channel with noisy feedback,
and the theorem is proved. In §4 a simple transmission method with active feedback is
considered.

The preliminary (and simplified) version of the paper (without detailed proofs) for $M = 3$
messages was published as [13].

§2. Channel with noiseless feedback. Proof of the formula (9).

We start with the noiseless feedback case and describe the transmission method which
will be used for noisy feedback as well. Moreover, in the noisy feedback case we will need
some formulas from the noiseless case.

Consider the BSC$(p)$ with noiseless feedback and $M$ messages $\theta_1, \ldots, \theta_M$. We assume
that $M_n \to \infty$, but $\ln M_n = o(n)$ as $n \to \infty$. We set some $\gamma \in [0,1]$ (it will be chosen later)
and divide the total transmission period $[0,n]$ into two phases: $[0,\gamma n]$ (phase I) and $(\gamma n, n]$
(phase II). We proceed as follows:

1) On phase I (i.e. on $[0,\gamma n]$) we use a code of $M$ codewords such that $d(x_i,x_j) =
\gamma n/2 + o(n)$, $i \ne j$ (the existence of such an "almost" simplex code can be shown using random
choice of codewords). On that phase the transmitter only observes via the feedback channel
the outputs of the forward channel, but does not change the transmission method.

2) Let $x$ be the transmitted codeword (of length $\gamma n$) and $y$ be the block received by the
receiver. After phase I, based on the block $y$, the transmitter selects two messages
$\theta_i, \theta_j$ (codewords $x_i, x_j$) which are the most probable for the receiver, and ignores all the
remaining messages $\{\theta_k\}$. Then, on phase II (i.e. on $(\gamma n, n]$) the transmitter helps the
receiver only to decide between those two most probable messages $\theta_i, \theta_j$, using two opposite
codewords of length $(1-\gamma)n$. After the moment $n$ the receiver makes a decision between those
two remaining messages $\theta_i, \theta_j$ (based on all signals received on $[0,n]$).

Clearly, a decoding error occurs in the following two cases.

1) After phase I the true message is not among the two most probable messages. We denote
that probability $P_1$.

2) After phase I the true message is among the two most probable, but after phase II the
true message is not the most probable. We denote that probability $P_2$.

Then for the total decoding error probability $P_e$ we have

$$P_e \le P_1 + P_2. \tag{10}$$

To evaluate the probabilities $P_1$ and $P_2$, without loss of generality, we assume that the
message $\theta_1$ is transmitted. We start with the probability $P_1$. Denote by $d(x,y)$ the Hamming
distance between $x$ and $y$, and $d_i = d(x_i,y)$. Then

$$P_1 \le \sum_{i>j>1} P\left\{d_1 \ge \max\{d_i, d_j\} \mid x_1\right\}. \tag{11}$$

We use the following auxiliary result (see proof in Appendix).

Lemma. 1) Let $x_1, x_2, x_3$ be codewords of length $m$. Denote $d_{ij} = d(x_i,x_j)$,
$d_i = d(x_i,y)$. Assuming that $d_{12} = d_{13} = d_{23} = 2m/3 + o(m)$, $m \to \infty$, consider the
probability

$$P_1(t,t_1) = P\left[d_2 = d_1 + \frac{2tm}{3} + o(m);\ d_3 = d_1 + \frac{2t_1m}{3} + o(m) \,\Big|\, x_1\right].$$

Then

$$\frac{3}{m}\ln P_1(t,t_1) = \ln(p^2q) + f(t,t_1) + o(1), \qquad |t| < 1,\ |t_1| < 1, \tag{12}$$

where

$$f(t,t_1) = \max_a f(a,t,t_1) = f(a_0,t,t_1), \qquad f(a,t,t_1) = h(a) + h(a+t) + h(a+t_1) + (a+t_1+t)\ln z, \tag{13}$$

and $a_0 = a_0(t,t_1)$ is the unique root of the equation

$$f'_a = \ln\frac{1-a}{a} + \ln\frac{1-a-t}{a+t} + \ln\frac{1-a-t_1}{a+t_1} + \ln z = 0.$$

The function $f(t,t_1)$ monotonically increases in $t_1 < (1-2p+t)/2$, and monotonically decreases in
$t_1 > (1-2p+t)/2$.

2) For any $|t| < 1$ and $t_1 \le (1-2p+t)/2$, we have

$$P\left[d_2 \le d_1 + \frac{2tm}{3};\ d_3 \le d_1 + \frac{2t_1m}{3} \,\Big|\, x_1\right] = P_1(t,t_1)\,e^{o(m)}, \qquad m \to \infty. \tag{14}$$

Note that the number of summation terms in the right-hand side of (11) does not exceed
$M^2 = e^{o(n)}$. Any three codewords $x_1, x_i, x_j$ have the effective length $m = 3\gamma n/4 + o(n)$
(on the remaining $\gamma n/4 + o(n)$ positions they have equal coordinates) and mutual distances
$d(x_k,x_l) = 2m/3 + o(m)$, $k \ne l$. Then using the formulas (13) and (14) with $t = t_1 = 0$, we
have

$$\frac{1}{m}\ln P_1 = \frac{1}{3}\ln(p^2q) + \frac{1}{3}\max_a\{3h(a) + a\ln z\} + o(1) = \ln\left(p^{1/3}q^{2/3} + p^{2/3}q^{1/3}\right) + o(1) = -F(p) + o(1),$$

where $F(p)$ is defined in (3), and the optimal $a = a_0$ is given by

$$a_0 = \frac{q^{1/3}}{p^{1/3} + q^{1/3}}. \tag{15}$$

As a result, from (11) we get

$$\ln\frac{1}{P_1} = \frac{3}{4}\gamma F(p)\,n + o(n), \qquad 0 < p < 1/2. \tag{16}$$

Remark 2. Let $\{x_1, x_2, x_3\}$ be a simplex code of length $n$. Then

$$-\frac{1}{n}\ln P\left\{d(x_1,y) \ge \max\{d(x_2,y),\, d(x_3,y)\} \mid x_1\right\} = F_3(p) + o(1), \qquad n \to \infty. \tag{17}$$

It explains the meaning of the value $F(p) = F_3(p)$.

Now we evaluate the probability $P_2$. On phase I (of length $\gamma n$) all the distances among
codewords are equal to $\gamma n/2 + o(n)$. On phase II (of length $(1-\gamma)n$) the distance between
the two remaining codewords equals $(1-\gamma)n$. Therefore the total distance between the true and
the competing codeword equals $(1-\gamma/2)n$. Therefore

$$P_2 \le M\,P\{\text{error when testing two codewords at distance } (1-\gamma/2)n\},$$

and then

$$\frac{1}{n}\ln P_2 = \frac{1-\gamma/2}{2}\ln(4pq) + o(1) = -(2-\gamma)E(p) + o(1). \tag{18}$$

As a result, from (10), (16) and (18) for the decoding error probability $P_e$ we have

$$\frac{1}{n}\ln P_e \le \frac{1}{n}\max\{\ln P_1, \ln P_2\} + o(1) \le -\min\left\{\frac{3}{4}\gamma F(p),\ (2-\gamma)E(p)\right\} + o(1).$$

We choose $\gamma = \gamma_0$ such that $P_1 = P_2$, i.e. set

$$\gamma_0 = \frac{8E(p)}{4E(p) + 3F(p)},$$

and then for $0 < p < 1/2$ get the formula (9).
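
The balancing of the two phases can be illustrated numerically. The sketch below (function names are chosen here for illustration) computes $\gamma_0$, confirms that the exponents $\frac{3}{4}\gamma_0 F(p)$ and $(2-\gamma_0)E(p)$ coincide, and prints the resulting gain $F_1(p,0)/E(p)$, which approaches $8/7$ as $p \to 1/2$ because $F(p)/E(p) \to 16/9$.

```python
import math

def E(p):
    """Formula (2)."""
    return 0.25 * math.log(1.0 / (4.0 * p * (1.0 - p)))

def F(p):
    """Formula (3)."""
    q = 1.0 - p
    return -math.log(p ** (1 / 3) * q ** (2 / 3) + q ** (1 / 3) * p ** (2 / 3))

def gamma0(p):
    """Phase-I fraction that equalizes the exponents of P_1 and P_2."""
    return 8 * E(p) / (4 * E(p) + 3 * F(p))

def F1_noiseless(p):
    """Exponent of the one-switch method with noiseless feedback, formula (9)."""
    return 6 * E(p) * F(p) / (4 * E(p) + 3 * F(p))

for p in (0.01, 0.1, 0.3, 0.499):
    g = gamma0(p)
    phase1 = 0.75 * g * F(p)          # exponent of P_1, from (16)
    phase2 = (2 - g) * E(p)           # exponent of P_2, from (18)
    print(f"p={p}: gamma0={g:.4f}  exponents=({phase1:.6f}, {phase2:.6f})  "
          f"gain={F1_noiseless(p) / E(p):.4f}")
```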
If $p = (1-\varepsilon)/2$, $\varepsilon \to 0$, then

$$F(p,0) \ge \frac{8}{7}E(p), \qquad p \to 1/2,$$

i.e. such a strategy with one switching moment gives a 14% gain over the no-feedback case (the
best strategy without a limit on the number of switching moments gives a 78% gain).

§3. Channel with noisy feedback. Proof of theorem

In the noisy feedback case, still using one switching moment, we slightly modify the
transmission method from §2 (especially its decoding method).

Transmission. Again we set a number $0 < \gamma < 1$. On phase I, of length $\gamma n$, we use
an "almost" simplex code. Let $x$ be the transmitted codeword (of length $\gamma n$), $y$ be the
block received by the receiver, and $x'$ be the block received by the transmitter. Based on
the transmitted codeword $x$ and the received block $x'$, the transmitter selects two messages
$\theta_i, \theta_j$ which look most probable for the receiver.

If the true message is among those two selected messages $\theta_i, \theta_j$, then on phase II (i.e.
on $(\gamma n, n]$) the transmitter uses two opposite codewords of length $(1-\gamma)n$ to help the
receiver to decide between those two most probable messages. For example, the transmitter
uses the all-zeros and all-ones codewords.

If the true message is not among the two selected messages $\theta_i, \theta_j$, then on phase II the
transmitter sends an intermediate block (say, half-zeros and half-ones). In any case, such an
event will be treated as an error.

Decoding. We set an additional number $t > 0$. Arrange the distances $\{d(x_i,y),\ i =
1, \ldots, M\}$ in increasing order, denoting

$$d^{(1)} = \min_i d(x_i,y) \le d^{(2)} \le \ldots \le d^{(M)} = \max_i d(x_i,y)$$

(in case of a tie we use any order). Let also $x^1, \ldots, x^M$ be the ranking of codewords after
phase I, i.e. $x^1$ is the most probable codeword, etc. There are two possible cases.

Case 1. If $d^{(3)} < d^{(2)} + t\gamma n/2$, then the receiver makes the decoding immediately after
phase I (in favor of the codeword closest to $y$). Although the transmitter still continues
transmission, the receiver has already made its decision.

Case 2. If $d^{(3)} \ge d^{(2)} + t\gamma n/2$, then after phase I the receiver selects the two most probable
messages $\theta_i, \theta_j$, and after transmission on phase II (i.e. after the moment $n$) makes a decision
between those two remaining messages $\theta_i, \theta_j$ in favor of the more probable of them.
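
The receiver's rule after phase I can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, codewords are plain 0/1 lists, and the Case 1 test is taken to compare the third and the second smallest distances with the margin $t\gamma n/2$ (passed in as `threshold`), in line with the error events analyzed below.

```python
def hamming(u, v):
    """Hamming distance between two equal-length 0/1 sequences."""
    return sum(a != b for a, b in zip(u, v))

def phase1_decision(codebook, y, threshold):
    """Receiver's rule after phase I (requires at least three codewords).

    codebook  : list of equal-length 0/1 sequences (phase-I codewords)
    y         : phase-I block received by the receiver
    threshold : the margin t*gamma*n/2
    Returns ("decide", i) in Case 1, or ("continue", (i, j)) in Case 2,
    where i and j index the closest and the second-closest codewords.
    """
    # sorted() is stable, so ties are broken by codeword index
    order = sorted(range(len(codebook)), key=lambda i: hamming(codebook[i], y))
    d = [hamming(codebook[i], y) for i in order]
    if d[2] < d[1] + threshold:
        return ("decide", order[0])                # Case 1: decode immediately
    return ("continue", (order[0], order[1]))      # Case 2: wait for phase II

# Toy usage with three codewords of length 6.
codebook = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
y = [0, 0, 0, 0, 0, 1]
print(phase1_decision(codebook, y, threshold=2))
print(phase1_decision(codebook, y, threshold=1))
```

In Case 2 the same pair of indices has to be identified by the transmitter from its own noisy copy $x'$, which is where the feedback noise enters the analysis.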

In order to act in agreement with the receiver, in case 2 it is important that
the transmitter can correctly identify the two messages $\theta_i, \theta_j$ which are most probable for the
receiver. Of course, an error in such selection is possible, but its probability should be
sufficiently small (which will be secured below).

Remark 3. We separate case 1 since after phase I, with relatively high probability,
the second-ranked $x^2$ and the third-ranked $x^3$ codewords will be approximately equiprobable, and
then it will be difficult for the transmitter to rank them correctly. But in that case (with
high probability) the first message $x^1$ will be much more probable than $x^2$ and $x^3$.

To evaluate the decoding error probability $P_e$, denote by $P_1$ and $P_2$ the decoding error
probabilities in case 1 (i.e. after phase I) and in case 2 (i.e. after the moment $n$)
for the noiseless feedback channel, respectively. Similarly, denote by $P_{2n}$ the decoding error
probability in case 2 for the noisy feedback case. Then for the decoding error probability
$P_e$ we have

$$P_e \le P_1 + P_2 + P_{2n}. \tag{19}$$

We evaluate the probabilities $P_1, P_2, P_{2n}$ in the right-hand side of (19). For $P_1$ we have

$$P_1 \le M^2(P_{11} + P_{12}), \tag{20}$$

where

$$P_{11} = P(d_2 \le d_1 \le d_3 \le d_1 + t\gamma n/2 \mid x_1), \qquad P_{12} = P(d_1 \ge \max\{d_2, d_3\} \mid x_1),$$

and $d_i = d(x_i,y)$, $i = 1, \ldots, M$.

The value $P_{12}$ was already estimated in (16) (denoted there $P_1$). The main contribution
to $P_1$ is given by the value $P_{11}$. To evaluate $P_{11}$ it is sufficient to consider the case when the
codewords $x_1, x_2, x_3$ have length $m = 3\gamma n/4$ (on the remaining $\gamma n/4$ positions they have
equal coordinates) and mutual distances $d(x_i,x_j) = 2m/3$, $i \ne j$. Then from (14) we have

$$P_{11} \le P\left[d_2 \le d_1,\ d_3 \le d_1 + \frac{2tm}{3} \,\Big|\, x_1\right]e^{o(n)} = P_1(0,t)\,e^{o(n)}. \tag{22}$$

For the value $P_1(0,t)$ we get from (14) and (13)

$$\frac{1}{m}\ln\frac{1}{P_1(0,t)} = G_1(t,p) + o(1), \qquad t < \frac{1}{2} - p, \tag{23}$$

where $G_1(t,p)$ is defined in (4). Moreover,

$$\frac{1}{m}\ln\frac{1}{P_1(0,t)} = \frac{1}{3}\ln\frac{1}{4pq} + o(1) = \frac{4}{3}E(p) + o(1), \qquad t \ge \frac{1}{2} - p. \tag{24}$$

The function $G_1(t,p)$ monotonically decreases in $t < 1/2 - p$. Moreover, $G_1(0,p) = F(p)$.
For $t \ge 1/2 - p$ the value $P_1(0,t)$ is essentially defined only by the event $\{d_1 \ge d_2\}$.
Since $P_{12} \le P_{11}$, we get from (20), (16) and (23)

$$\frac{4}{3\gamma n}\ln\frac{1}{P_1} = G_1(t,p) + o(1), \qquad t < \frac{1}{2} - p. \tag{25}$$

For the value $P_2$ the formula (18) remains valid.

It remains to evaluate $P_{2n}$, which is the probability that the true codeword $x_1$ is among
the two most probable codewords for the receiver, but is not such for the transmitter.
Introduce the random event

$$A = \left\{d(x_3,y) \ge \max\{d(x_1,y),\, d(x_2,y)\} + t\gamma n/2;\ \ d(x_3,x') \le \max\{d(x_1,x'),\, d(x_2,x')\}\right\}.$$

Then

$$P_{2n} \le M^2 P(A \mid x_1)\,e^{o(n)}.$$

To evaluate $P(A \mid x_1)$ it is convenient to use two related random events

$$A_1 = \left\{d(x_3,y) \ge d(x_2,y) + t\gamma n/2;\ \ d(x_3,x') \le d(x_2,x')\right\},$$
$$A_2 = \left\{d(x_3,y) \ge d(x_2,y) + t\gamma n/2;\ \ d(x_2,x') \le d(x_3,x') \le d(x_1,x')\right\}. \tag{27}$$

Since $A \subseteq A_1 \cup A_2$, we have

$$P(A \mid x_1) \le P(A_1 \mid x_1) + P(A_2 \mid x_1). \tag{28}$$

We may assume that the codewords $x_1, x_2, x_3$ have length $m = 3\gamma n/4$ (on the remaining
$\gamma n/4$ positions they have equal coordinates) and mutual distances $d(x_i,x_j) = 2m/3$, $i \ne j$.
All blocks $x_1, x_2, x_3, y, x'$ are shown in Fig. 3, where $a, b, c, a_1, a_2, b_1, b_2, c_1, c_2$ denote the
fractions of 1's in the corresponding parts of the received blocks $y$ and $x'$. Then in addition
to the formulas (46) (see Appendix) we have

$$d(x_1,x') = [aa_1 + (1-a)a_2 + bb_1 + (1-b)b_2 + cc_1 + (1-c)c_2]\,m/3,$$
$$d(x_2,x') = [a(1-a_1) + (1-a)(1-a_2) + b(1-b_1) + (1-b)(1-b_2) + cc_1 + (1-c)c_2]\,m/3,$$
$$d(x_3,x') = [a(1-a_1) + (1-a)(1-a_2) + bb_1 + (1-b)b_2 + c(1-c_1) + (1-c)(1-c_2)]\,m/3,$$
$$d(y,x') = [a(1-a_1) + (1-a)a_2 + b(1-b_1) + (1-b)b_2 + c(1-c_1) + (1-c)c_2]\,m/3.$$

We start with the probability $P(A_1 \mid x_1)$. Since

$$d(x_3,y) \ge d(x_2,y) + t\gamma n/2 \ \Leftrightarrow\ b \ge c + t,$$
$$d(x_3,x') \le d(x_2,x') \ \Leftrightarrow\ cc_1 + (1-c)c_2 \ge bb_1 + (1-b)b_2,$$

for $P(A_1 \mid x_1)$ we have with $z = q/p$, $z_1 = q_1/p_1$ (omitting the parts where $x_2, x_3$ coincide
on all positions)

$$P(A_1 \mid x_1) = (qq_1)^{2m/3}\max_{b,\ldots,c_2}\{AB\}\,[1+o(1)] \le (qq_1)^{2m/3}\max_{b,\ldots,c_2}A \cdot \max_{b,\ldots,c_2}B\,[1+o(1)], \tag{29}$$

where

$$A = \binom{m/3}{bm/3}\binom{m/3}{cm/3}z^{-(b+c)m/3},$$
$$B = \binom{bm/3}{b_1bm/3}\binom{(1-b)m/3}{b_2(1-b)m/3}\binom{cm/3}{c_1cm/3}\binom{(1-c)m/3}{c_2(1-c)m/3}z_1^{-\delta(y,x')m/3}, \tag{30}$$
$$\delta(y,x') = b(1-b_1) + (1-b)b_2 + c(1-c_1) + (1-c)c_2,$$

and where the maximum is taken provided

$$b \ge c + t, \qquad cc_1 + (1-c)c_2 \ge bb_1 + (1-b)b_2. \tag{31}$$

From the definition (27) of the set $A_1$ it is clear that the maximum of $\{AB\}$ in (29) is attained
when there are equalities in both relations (31). Moreover, there is no loss when we maximize
the values $A$, $B$ separately. Then we have

$$\frac{3}{m}\ln P(A_1 \mid x_1) \le 2\ln(qq_1) + \max f + \max g + o(1), \tag{32}$$

where

$$f = \frac{3}{m}\ln A = h(b) + h(c) - (b+c)\ln z,$$
$$g = \frac{3}{m}\ln B = bh(b_1) + (1-b)h(b_2) + ch(c_1) + (1-c)h(c_2) - \delta(y,x')\ln z_1, \tag{33}$$

and where the maximum is taken provided

$$b = c + t, \qquad cc_1 + (1-c)c_2 = bb_1 + (1-b)b_2. \tag{34}$$

Note that both functions $f$, $g$ are $\cap$-concave in all variables.
For the maximum of $f$ we have

$$\max_{(34)} f \le \max_{b=c+t} f = \max_c\{h(c) + h(c+t) - (2c+t)\ln z\} = h(c_0+t) + h(c_0) - (2c_0+t)\ln z, \tag{35}$$

where $c_0(t,p)$ is defined in (4). In fact, there is equality in (35).

To maximize the function $g$ we use standard Lagrange multipliers. Then for the
optimal parameter values we get

$$c_1 = 1 - b_2, \qquad c_2 = 1 - b_1, \qquad b_2 = \frac{1-(1+t)b_1}{1-t},$$

where $b_1 = b_1(t,p_1)$ is defined in (4). It gives

$$\max g = (1+t)h(b_1) + (1-t)h\left(\frac{(1+t)b_1-t}{1-t}\right) - [2+t-2(1+t)b_1]\ln z_1. \tag{36}$$

Note that (since $z_1 > 1$) we have $2 + t - 2(1+t)b_1 > 0$.

Therefore from (32), (35) and (36) we get

$$\ln P(A_1 \mid x_1) = -G_2(t,p,p_1)\,m + o(n), \tag{37}$$

where $G_2(t,p,p_1)$ is defined in (4).

Finally consider the probability $P(A_2 \mid x_1)$ from (27), (28). We show that

$$\ln P(A_2 \mid x_1) \le \ln P(A_1 \mid x_1) + o(n). \tag{38}$$

For that purpose introduce the random events

$$\mathcal{C} = \{d(x_3,y) \ge d(x_2,y) + t\gamma n/2\}, \qquad \mathcal{D} = \{d(x_2,x') \le d(x_3,x') \le d(x_1,x')\}$$

and

$$\mathcal{C}_1 = \{d(x_3,y) = d(x_2,y) + t\gamma n/2 + o(n)\}, \qquad \mathcal{D}_1 = \{d(x_2,x') = d(x_3,x') + o(n) = d(x_1,x') + o(n)\}.$$

Then $A_2 = \mathcal{C} \cap \mathcal{D}$, and we have for any $t > 0$

$$P(A_2 \mid x_1) = P(\mathcal{C} \cap \mathcal{D} \mid x_1) \sim P(\mathcal{C}_1 \cap \mathcal{D}_1 \mid x_1) \le P(\{d(x_3,x') \le d(x_2,x')\} \cap \mathcal{C}_1 \mid x_1) \sim P(A_1 \mid x_1),$$

which proves the inequality (38).

As a result, from (28), (37) and (38) we have

$$\frac{1}{n}\ln P_{2n} = -\frac{3\gamma}{4}G_2(t,p,p_1) + o(1). \tag{39}$$

For the decoding error probability $P_e$ from (19), (25), (18) and (39) we get

$$\frac{1}{n}\ln\frac{1}{P_e} = \max_{\gamma,t}\min\left\{\frac{3\gamma}{4}\min\{G_1(t,p),\, G_2(t,p,p_1)\},\ (2-\gamma)E(p)\right\} + o(1) = \max_t \frac{6\min\{G_1(t,p),\, G_2(t,p,p_1)\}\,E(p)}{3\min\{G_1(t,p),\, G_2(t,p,p_1)\} + 4E(p)} + o(1), \tag{40}$$

where we set

$$\gamma = \frac{8E(p)}{3\min\{G_1(t,p),\, G_2(t,p,p_1)\} + 4E(p)}.$$
The right-hand side of (40) exceeds $E(p)$, if for some $t$ the inequality holds

$$3\min\{G_1(t,p),\, G_2(t,p,p_1)\} > 4E(p). \tag{41}$$

Moreover, $t < 1/2 - p$ (otherwise, $3G_1(t,p) = 4E(p)$). Since $G_2(t,p,p_1)$ monotonically
increases in $t$, in order to have the inequality (41) fulfilled, we need to have
$3G_2(1/2-p,p,p_1) > 4E(p)$. Therefore introduce the function $p_0(p)$ as the unique root of the equation
(6). Then for any $p_1 < p_0(p)$ and some $t < 1/2 - p$ the inequality (41) is fulfilled, and
therefore the right-hand side of (40) exceeds $E(p)$. As a result, from (40) we get the formula
(7), which proves the theorem. □

§4. Channel with active feedback. Example

The use of coding in the feedback channel enlarges the transmission possibilities. As an
example, we consider the simplest of such transmission methods, proposed by G.A. Kabatyansky.
The transmitter and the receiver send information in turns.

We set some numbers $\gamma, \gamma_1 > 0$ such that $\gamma + \gamma_1 < 1$, and divide the total transmission
period $[0,n]$ into the intervals $[0,\gamma n]$, $(\gamma n, (\gamma+\gamma_1)n]$ and $((\gamma+\gamma_1)n, n]$. We call those intervals
phases I, II and III, respectively.

The transmitter sends information on phases I and III, while the receiver sends
information only on phase II. On phase I of length $\gamma n$ we use an "almost" simplex code. After
phase I, based on the received block $y$, the receiver selects the two most probable messages.
Then, during phase II of length $\gamma_1 n$, it informs the transmitter about those two messages.
On phase III, the transmitter uses two opposite codewords of length $(1-\gamma-\gamma_1)n$ to help
the receiver to decide between those two most probable messages.

A decoding error occurs in the following three cases: 

1) After phase I the true message is not among two most probable messages. We denote 
that probability Pi. 

2) After phase I the true message is among the two most probable, but on phase II a
decoding error occurs at the transmitter. We denote that probability $P_2$.

3) After phase II the transmitter identified correctly two most probable messages (and 
the true message is among them), but after phase III the true message is not the most 
probable one among two possible messages. We denote that probability P3. 

Then for the decoding error probability P e we have 

Po < Pi + P2 + P3 ■ (42) 
Similarly to § 3, for the values Pi, P2, P3 in the right-hand side of fT4"2"j) we have (as n — > 00) 

n Pi 4 

-In -^=71^1) +o(l), (43) 
n P 2 



1 



n 



lnP 3 = (2-7-27i)P(p) + o(l) 



We choose the parameters $\gamma, \gamma_1$ such that the exponents of $P_1, P_2, P_3$ become equal, i.e. we set

$$\gamma_1 = \frac{3\gamma F(p)}{4E(p_1)}, \qquad \gamma = \frac{8E(p)}{3F(p) + 4E(p) + 6F(p)E(p)/E(p_1)}.$$

Then we get

Proposition. For the decoding error probability $P_e$ of the transmission method described, the relation holds

$$\frac{1}{n}\ln\frac{1}{P_e} \ge \frac{E(p)}{1/2 + 2E(p)/(3F(p)) + E(p)/E(p_1)} + o(1). \qquad (44)$$
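The balancing of the three exponents can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names are hypothetical, and the arguments `E`, `F`, `E1` are plain numbers standing in for $E(p)$, $F(p)$, $E(p_1)$, whose definitions are given earlier in the paper and are not reproduced here.

```python
def balanced_parameters(E, F, E1):
    """Return (gamma, gamma1, exponent) for the balanced three-phase scheme.

    E, F, E1 are numeric stand-ins for E(p), F(p), E(p_1) (assumed inputs).
    """
    gamma = 8.0 * E / (3.0 * F + 4.0 * E + 6.0 * F * E / E1)
    gamma1 = 3.0 * gamma * F / (4.0 * E1)
    # Right-hand side of (44) without the o(1) term.
    exponent = E / (0.5 + 2.0 * E / (3.0 * F) + E / E1)
    return gamma, gamma1, exponent


def phase_exponents(gamma, gamma1, E, F, E1):
    """Exponents of P_1, P_2, P_3 from (43), without the o(1) terms."""
    return (3.0 * gamma * F / 4.0,
            gamma1 * E1,
            (2.0 - gamma - 2.0 * gamma1) * E)


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    E, F, E1 = 0.01, 0.0175, 0.5
    g, g1, ex = balanced_parameters(E, F, E1)
    print(g, g1, ex)
    print(phase_exponents(g, g1, E, F, E1))
```

With these inputs the three phase exponents coincide with each other and with the right-hand side of (44), which is the content of the Proposition.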

Since $E(p)/F(p) \to 3/4$ as $p \to 0$, the denominator in (44) tends to $1 + E(p)/E(p_1) > 1$, so such a transmission method essentially does not improve $E(p)$ for small $p$ (and any $p_1$).

But if $p = (1 - \varepsilon)/2$, $\varepsilon \to 0$, then $E(p)/F(p) \to 9/16$ as $p \to 1/2$, and (44) takes the form

$$\frac{1}{n}\ln\frac{1}{P_e} \ge \frac{E(p)}{7/8 + E(p)/E(p_1)} + o(1). \qquad (45)$$

In that case, such a transmission method improves $E(p)$ if $E(p_1) > 8E(p)$. In particular, if $p_1 = (1 - \varepsilon_1)/2$, $\varepsilon_1 \to 0$, then $E(p)/E(p_1) \approx \varepsilon^2/\varepsilon_1^2$. Therefore the right-hand side of (45) is better than $E(p)$ if $\varepsilon_1 > \sqrt{8}\,\varepsilon$. This is better than the relation (7) (where $p_1 < 0.067$ is required).
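The threshold $E(p_1) > 8E(p)$ can be illustrated numerically. The sketch below assumes the closed form $E(p) = \frac{1}{4}\ln\frac{1}{4pq}$; this form is an assumption here (it is consistent with the approximation $E(p)/E(p_1) \approx \varepsilon^2/\varepsilon_1^2$ used above), and the function names are hypothetical.

```python
import math


def E(p):
    """Assumed form E(p) = (1/4) ln(1 / (4 p (1 - p))); not taken from this section."""
    return 0.25 * math.log(1.0 / (4.0 * p * (1.0 - p)))


def rhs_45(p, p1):
    """Right-hand side of (45) without the o(1) term."""
    return E(p) / (7.0 / 8.0 + E(p) / E(p1))


def improves(p, p1):
    """True if the right-hand side of (45) exceeds E(p)."""
    return rhs_45(p, p1) > E(p)


if __name__ == "__main__":
    eps = 0.01
    for eps1 in (0.02, 0.05):
        p, p1 = (1 - eps) / 2, (1 - eps1) / 2
        # The ratio E(p)/E(p1) should be close to eps^2 / eps1^2.
        print(eps1, E(p) / E(p1), (eps / eps1) ** 2, improves(p, p1))
```

For $\varepsilon = 0.01$ the threshold $\sqrt{8}\,\varepsilon$ is about $0.028$, so of the two values tried only $\varepsilon_1 = 0.05$ lies above it.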

APPENDIX

Proof of lemma. Since part 2) follows from part 1), it is sufficient to prove part 1). To simplify formulas we assume that $d_{12} = d_{13} = d_{23} = 2m/3$ (i.e. that $\{x_i\}$ is a simplex code). Such codewords $x_1, x_2, x_3$ are shown in Fig. 4, where $a, b, c$ denote the fractions of 1's in the corresponding parts of the received block $y$. Since



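$$\begin{aligned}
d_1 &= d(x_1, y) = (a + b + c)m/3, \\
d_2 &= d(x_2, y) = (2 + c - a - b)m/3, \\
d_3 &= d(x_3, y) = (2 + b - a - c)m/3,
\end{aligned}$$

for the corresponding random events we have

$$\begin{aligned}
\{d_2 = d_1 + 2tm/3\} &\Leftrightarrow \{a + b = 1 - t\}, \\
\{d_3 = d_1 + 2t_1 m/3\} &\Leftrightarrow \{a + c = 1 - t_1\}.
\end{aligned}$$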
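The distance formulas and the two event equivalences above can be verified on a concrete block. The following is a small sketch under the assumption that $m$ is divisible by 3 and that the codewords have the block structure of Fig. 4 ($x_1$ all zeros, $x_2$ with ones in the first two parts, $x_3$ with ones in the first and third parts); the helper names are hypothetical.

```python
import random


def simplex_triple(m):
    """Codewords with the block structure of Fig. 4 (m assumed divisible by 3)."""
    k = m // 3
    x1 = [0] * m
    x2 = [1] * (2 * k) + [0] * k
    x3 = [1] * k + [0] * k + [1] * k
    return x1, x2, x3


def hamming(u, v):
    """Hamming distance between two equal-length binary lists."""
    return sum(ui != vi for ui, vi in zip(u, v))


def check_distances(m, seed=0):
    """Check the formulas for d_1, d_2, d_3 and the event equivalences on one random y."""
    k = m // 3
    rng = random.Random(seed)
    x1, x2, x3 = simplex_triple(m)
    y = [rng.randint(0, 1) for _ in range(m)]
    # A, B, C: numbers of 1's of y in the three parts, so a = A/k, b = B/k, c = C/k.
    A, B, C = (sum(y[i * k:(i + 1) * k]) for i in range(3))
    d1, d2, d3 = hamming(x1, y), hamming(x2, y), hamming(x3, y)
    formulas_ok = (d1 == A + B + C
                   and d2 == 2 * k + C - A - B
                   and d3 == 2 * k + B - A - C)
    # d2 = d1 + 2tk is equivalent to a + b = 1 - t; in integers: d2 - d1 = 2(k - A - B).
    events_ok = (d2 - d1 == 2 * (k - A - B)) and (d3 - d1 == 2 * (k - A - C))
    return formulas_ok and events_ok


if __name__ == "__main__":
    print(check_distances(30))
```

Integer counts are used instead of the fractions $a, b, c$ so that the comparisons are exact.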



Therefore

$$A(t, t_1) \sim q^m \max_{\substack{a + b = 1 - t \\ a + c = 1 - t_1}} \binom{m/3}{am/3}\binom{m/3}{bm/3}\binom{m/3}{cm/3}\, z^{-(a + b + c)m/3},$$

and then

$$\frac{3}{m}\ln P_1(t, t_1) = \ln(p^2 q) + \max_a f(a, t, t_1) + o(1),$$

where

$$\begin{aligned}
f(a, t, t_1) &= h(a) + h(a + t) + h(a + t_1) + (a + t_1 + t)\ln z, \\
f'_a &= \ln\frac{1 - a}{a} + \ln\frac{1 - a - t}{a + t} + \ln\frac{1 - a - t_1}{a + t_1} + \ln z, \\
f'_t &= \ln\frac{1 - a - t}{a + t} + \ln z, \qquad f'_{t_1} = \ln\frac{1 - a - t_1}{a + t_1} + \ln z.
\end{aligned} \qquad (46)$$

The function $f(a, t, t_1)$ is $\cap$-concave in all arguments. Therefore, the function $\max_a f(a, t, t_1)$ (and similar ones) is also $\cap$-concave in all arguments. In particular, $\max_{a, t, t_1} f(a, t, t_1)$ is attained for $a = p$, $t = t_1 = 1 - 2p$. Similarly, $\max_{a, t_1} f(a, t, t_1)$ is attained for $a = (1 - t)/2$, $t_1 = (1 - 2p + t)/2$. Then we get part 1) of the lemma.



□ 
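The stationary points used in the proof can be checked against the derivatives (46). The sketch below assumes $z = q/p$ and $h(x) = -x\ln x - (1 - x)\ln(1 - x)$; both are assumptions in the sense that their definitions lie outside this appendix, although they are consistent with (46) and with the maximizers stated above. Function names are hypothetical.

```python
import math


def h(x):
    """Binary entropy in nats (assumed definition of h)."""
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def f(a, t, t1, p):
    """f(a, t, t1) with z = (1 - p) / p (assumed definition of z)."""
    z = (1.0 - p) / p
    return h(a) + h(a + t) + h(a + t1) + (a + t + t1) * math.log(z)


def grad_f(a, t, t1, p):
    """Partial derivatives (f'_a, f'_t, f'_{t1}) as written in (46)."""
    lz = math.log((1.0 - p) / p)
    dt = math.log((1.0 - a - t) / (a + t)) + lz
    dt1 = math.log((1.0 - a - t1) / (a + t1)) + lz
    da = math.log((1.0 - a) / a) + dt + dt1 - lz
    return da, dt, dt1


if __name__ == "__main__":
    p = 0.1
    # Global stationary point: a = p, t = t1 = 1 - 2p.
    print(grad_f(p, 1 - 2 * p, 1 - 2 * p, p))
    # Stationary point in (a, t1) for fixed t: a = (1 - t)/2, t1 = (1 - 2p + t)/2.
    t = 0.2
    print(grad_f((1 - t) / 2, t, (1 - 2 * p + t) / 2, p))
```

At the global maximizer one also finds $\ln(p^2 q) + f(p, 1 - 2p, 1 - 2p) = 0$, as expected for the typical event $d_1 \approx pm$.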






REFERENCES 



1. Shannon C. E. The Zero Error Capacity of a Noisy Channel // IRE Trans. Inform. Theory. 1956. V. 2. no. 3. P. 8-19.

2. Dobrushin R. L. Asymptotic bounds on error probability for message transmission in a memoryless channel with feedback // Probl. Kibern. No. 8. M.: Fizmatgiz, 1962. P. 161-168.

3. Horstein M. Sequential Decoding Using Noiseless Feedback // IEEE Trans. Inform. Theory. 1963. V. 9. no. 3. P. 136-143.

4. Berlekamp E. R. Block Coding with Noiseless Feedback. Ph. D. Thesis, MIT, Dept. Electrical Engineering, 1964.

5. Burnashev M. V. Data transmission over a discrete channel with feedback: Random transmission time // Problems of Inform. Transm. 1976. V. 12. no. 4. P. 10-30.

6. Burnashev M. V. On a Reliability Function of Binary Symmetric Channel with Feedback // Problems of Inform. Transm. 1988. V. 24. no. 1. P. 3-10.

7. Pinsker M. S. The probability of error in block transmission in a memoryless Gaussian channel with feedback // Problems of Inform. Transm. 1968. V. 4. no. 4. P. 3-19.

8. Schalkwijk J. P. M., Kailath T. A Coding Scheme for Additive Noise Channels with Feedback - I: No Bandwidth Constraint // IEEE Trans. Inform. Theory. 1966. V. 12. no. 2. P. 172-182.

9. Tchamkerten A., Telatar E. Variable Length Coding over an Unknown Channel // IEEE Trans. Inform. Theory. 2006. V. 52. no. 5. P. 2126-2145.

10. Yamamoto H., Itoh K. Asymptotic Performance of a Modified Schalkwijk-Barron Scheme for Channels with Noiseless Feedback // IEEE Trans. Inform. Theory. 1979. V. 25. no. 6. P. 729-733.

11. Draper S. C., Sahai A. Noisy Feedback Improves Communication Reliability // Proc. IEEE International Symposium on Information Theory. Seattle, WA, July 2006. P. 69-73.

12. Kim Y.-H., Lapidoth A., Weissman T. The Gaussian Channel with Noisy Feedback // Proc. IEEE International Symposium on Information Theory. Nice, France, June 2007. P. 1416-1420.

13. Burnashev M. V., Yamamoto H. On BSC, Noisy Feedback and Three Messages // Proc. IEEE Int. Sympos. on Information Theory. Toronto, Canada, July 2008. P. 886-889.






Burnashev Marat Valievich 

Institute for Information Transmission Problems RAS 

burn@iitp.ru 

Yamamoto Hirosuke 

The University of Tokyo, Japan 

hirosuke@ieee.org







Fig. 1. Decoding regions $\mathcal{D}_1, \mathcal{D}_2, \mathcal{D}_3$ and directions of output drives



[Plot; horizontal axis: $p$, from 0.1 to 0.5]

Fig. 3. Blocks $x_1, x_2, x_3, y, x'$



positions   (0, m/3]   (m/3, 2m/3]   (2m/3, m]
x_1         00...0     00...0        00...0
x_2         11...1     11...1        00...0
x_3         11...1     00...0        11...1
y           a          b             c

Fig. 4. Blocks $x_1, x_2, x_3, y$